NSF PAR Search | NSF Public Access Repository

A Textbook Remedy for Domain Shifts: Knowledge Priors for Medical Image Analysis

Yang, Yue; Gandhi, Mona; Wang, Yufei; Wu, Yifan; Yao, Michael S; Callison-Burch, Chris; Gee, James C; Yatskar, Mark (December 2024, NeurIPS)

While deep networks have achieved broad success in analyzing natural images, when applied to medical scans, they often fail in unexpected situations. This study investigates model sensitivity to domain shifts, such as data sampled from different hospitals or confounded by demographic variables like sex and race, focusing on chest X-rays and skin lesion images. The key finding is that existing visual backbones lack an appropriate prior for reliable generalization in these settings. Inspired by medical training, the authors propose incorporating explicit medical knowledge communicated in natural language into deep networks. They introduce Knowledge-enhanced Bottlenecks (KnoBo), a class of concept bottleneck models that integrate knowledge priors, enabling reasoning with clinically relevant factors found in medical textbooks or PubMed. KnoBo utilizes retrieval-augmented language models to design an appropriate concept space, paired with an automatic training procedure for recognizing these concepts. Evaluations across 20 datasets demonstrate that KnoBo outperforms fine-tuned models on confounded datasets by 32.4% on average. Additionally, PubMed is identified as a promising resource for enhancing model robustness to domain shifts, outperforming other resources in both information diversity and prediction performance.
Full Text Available
Visualizing the Obvious: A Concreteness-based Ensemble Model for Noun Property Prediction

Yang, Yue; Panagopoulou, Artemis; Apidianaki, Marianna; Yatskar, Mark; Callison-Burch, Chris (December 2022, Findings of The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022))

Neural language models encode rich knowledge about entities and their relationships which can be extracted from their representations using probing. Common properties of nouns (e.g., red strawberries, small ant) are, however, more challenging to extract compared to other types of knowledge because they are rarely explicitly stated in texts. We hypothesize this to mainly be the case for perceptual properties which are obvious to the participants in the communication. We propose to extract these properties from images and use them in an ensemble model, in order to complement the information that is extracted from language models. We consider perceptual properties to be more concrete than abstract properties (e.g., interesting, flawless). We propose to use the adjectives’ concreteness score as a lever to calibrate the contribution of each source (text vs. images). We evaluate our ensemble model in a ranking task where the actual properties of a noun need to be ranked higher than other non-relevant properties. Our results show that the proposed combination of text and images greatly improves noun property prediction compared to powerful text-based language models.
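As a rough illustration of the concreteness-weighted ensemble described in this abstract, the sketch below interpolates between a text-based score and an image-based score for each candidate adjective. The linear interpolation, the function name `ensemble_rank`, the assumption that concreteness is scaled to [0, 1], and all numeric scores are hypothetical; the abstract only states that concreteness calibrates the contribution of each source.

```python
from typing import Dict, List


def ensemble_rank(text_scores: Dict[str, float],
                  image_scores: Dict[str, float],
                  concreteness: Dict[str, float]) -> List[str]:
    """Rank candidate adjectives for one noun, best first.

    Assumption: concreteness is scaled to [0, 1], and a more concrete
    adjective leans more heavily on the image-based score.
    """
    combined = {}
    for adj in text_scores:
        c = concreteness[adj]
        # Linear interpolation between the two sources (illustrative choice).
        combined[adj] = (1.0 - c) * text_scores[adj] + c * image_scores[adj]
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical scores for the noun "strawberry"; not taken from the paper.
text_scores = {"red": 0.2, "interesting": 0.6, "small": 0.3}
image_scores = {"red": 0.9, "interesting": 0.1, "small": 0.5}
concreteness = {"red": 0.9, "interesting": 0.2, "small": 0.7}

ranking = ensemble_rank(text_scores, image_scores, concreteness)
print(ranking)
```

With these made-up numbers, the concrete adjective "red" rises to the top because the image score dominates its combined score, even though the text score alone would have ranked it last.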
Full Text Available
Visual Goal-Step Inference using wikiHow

https://doi.org/10.18653/v1/2021.emnlp-main.165

Yang, Yue; Panagopoulou, Artemis; Lyu, Qing; Zhang, Li; Yatskar, Mark; Callison-Burch, Chris (January 2021, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing)

Understanding what sequence of steps is needed to complete a goal can help artificial intelligence systems reason about human activities. Past work in NLP has examined the task of goal-step inference for text. We introduce the visual analogue. We propose the Visual Goal-Step Inference (VGSI) task, where a model is given a textual goal and must choose which of four images represents a plausible step towards that goal. With a new dataset harvested from wikiHow consisting of 772,277 images representing human actions, we show that our task is challenging for state-of-the-art multimodal models. Moreover, the multimodal representation learned from our data can be effectively transferred to other datasets like HowTo100m, increasing the VGSI accuracy by 15-20%. Our task will facilitate multimodal reasoning about procedural events.
Full Text Available
Gender Bias in Contextualized Word Embeddings

https://doi.org/10.18653/v1/N19-1064

Zhao, Jieyu; Wang, Tianlu; Yatskar, Mark; Cotterell, Ryan; Ordonez, Vicente; Chang, Kai-Wei (January 2019, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies)

In this paper, we quantify, analyze and mitigate gender bias exhibited in ELMo’s contextualized word vectors. First, we conduct several intrinsic analyses and find that (1) training data for ELMo contains significantly more male than female entities, (2) the trained ELMo embeddings systematically encode gender information and (3) ELMo unequally encodes gender information about male and female entities. Then, we show that a state-of-the-art coreference system that depends on ELMo inherits its bias and demonstrates significant bias on the WinoBias probing corpus. Finally, we explore two methods to mitigate such gender bias and show that the bias demonstrated on WinoBias can be eliminated.
Full Text Available
Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods

https://doi.org/10.18653/v1/N18-2003

Zhao, Jieyu; Wang, Tianlu; Yatskar, Mark; Ordonez, Vicente; Chang, Kai-Wei (January 2018, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies)

Full Text Available
Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints

https://doi.org/10.18653/v1/D17-1323

Zhao, Jieyu; Wang, Tianlu; Yatskar, Mark; Ordonez, Vicente; Chang, Kai-Wei (September 2017, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing)

Language is increasingly being used to define rich visual recognition problems with supporting image collections sourced from the web. Structured prediction models are used in these tasks to take advantage of correlations between co-occurring labels and visual input but risk inadvertently encoding social biases found in web corpora. In this work, we study data and models associated with multilabel object classification and visual semantic role labeling. We find that (a) datasets for these tasks contain significant gender bias and (b) models trained on these datasets further amplify existing bias. For example, the activity cooking is over 33% more likely to involve females than males in a training set, and a trained model further amplifies the disparity to 68% at test time. We propose to inject corpus-level constraints for calibrating existing structured prediction models and design an algorithm based on Lagrangian relaxation for collective inference. Our method results in almost no performance loss for the underlying recognition task but decreases the magnitude of bias amplification by 47.5% and 40.5% for multilabel classification and visual semantic role labeling, respectively.
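To make the notion of bias amplification in this abstract concrete, here is a minimal sketch that compares the share of female agents for one activity in training labels against the share in model predictions. The function name `female_ratio` and all counts are hypothetical, and the ratio-of-counts measure is one plausible way to quantify the disparity, not necessarily the exact metric used in the paper.

```python
def female_ratio(female_count: int, male_count: int) -> float:
    """Fraction of instances of an activity whose agent is labeled female."""
    return female_count / (female_count + male_count)


# Hypothetical counts for one activity (for example, cooking).
train_female, train_male = 200, 100   # counts in the training labels
pred_female, pred_male = 84, 16       # counts in the model's test predictions

train_ratio = female_ratio(train_female, train_male)
pred_ratio = female_ratio(pred_female, pred_male)

# A positive value means the model exaggerates the skew already in the data.
amplification = pred_ratio - train_ratio

print(f"train ratio:   {train_ratio:.3f}")
print(f"pred ratio:    {pred_ratio:.3f}")
print(f"amplification: {amplification:.3f}")
```

Corpus-level constraints of the kind the abstract mentions would aim to keep the predicted ratio close to the training ratio, which shrinks `amplification` toward zero.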
Full Text Available
